Please draw your own subjective distributions for the following events.
- The probability that it will snow at Reed next winter.
- The probability that, on a given night, the sun has gone supernova.
- The total number of individual socks that you own.
\(H_0\): I have \(N_{pairs}\) pairs of socks and \(N_{singles}\) singletons. The first 11 socks that I pull out of the machine are a random sample from this population.
Test statistic: the number of singletons in the sample, which was 11.
Null distribution: found by probability theory or by simulation.
Find the p-value if you like.
\[N_{pairs} = 9; \quad N_{singles} = 5\]
We'll use simulation.
Create the population of socks:
sock_pairs <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K")
sock_singles <- c("l", "m", "n", "o", "p")
socks <- c(rep(sock_pairs, each = 2), sock_singles)
socks
##  [1] "A" "A" "B" "B" "C" "C" "D" "D" "E" "E" "F" "F" "G" "G" "H" "H" "I"
## [18] "I" "J" "J" "K" "K" "l" "m" "n" "o" "p"
picked_socks <- sample(socks, size = 11, replace = FALSE)
picked_socks
## [1] "J" "D" "n" "F" "p" "B" "m" "l" "H" "I" "C"
sock_counts <- table(picked_socks)
sock_counts
## picked_socks
## B C D F H I J l m n p
## 1 1 1 1 1 1 1 1 1 1 1
n_singles <- sum(sock_counts == 1)
n_singles
## [1] 11
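The calls that follow use a `pick_socks()` function whose definition is not shown here. A minimal sketch of what it might look like, wrapping the steps above into one function, is given below. Only the argument names come from the call; the body, including the integer sock labels, is an assumption.

```r
# Hypothetical helper: draw N_pick socks without replacement from a
# population of N_pairs pairs and N_singles singletons, and return the
# number of socks in the draw that have no match in the draw.
pick_socks <- function(N_pairs, N_singles, N_pick) {
  # Give each pair and each singleton its own integer label.
  pair_ids <- seq_len(N_pairs)
  single_ids <- N_pairs + seq_len(N_singles)
  socks <- c(rep(pair_ids, each = 2), single_ids)

  # Sample positions rather than values, which sidesteps sample()'s
  # special behaviour when given a single number.
  picked <- socks[sample.int(length(socks), N_pick)]

  # Labels that appear exactly once are unmatched in this draw.
  sum(table(picked) == 1)
}

pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11)
```

Because every matched pair removes two socks from the unmatched count, the result for 11 picked socks is always odd.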
pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11)
## [1] 9
pick_socks(9, 5, 11)
## [1] 7
pick_socks(9, 5, 11)
## [1] 7
Repeat many, many times…
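One way to do the repetition is with `replicate()`. The sketch below is self-contained and makes a few assumptions: the seed is arbitrary, the socks are labelled with integers rather than letters, and 1000 repetitions are used to match the divisor that appears in the tail calculation.

```r
set.seed(42)  # assumption: arbitrary seed, only for reproducibility

# Population under H0: 9 pairs (labels 1 to 9, two of each) and
# 5 singletons (labels 10 to 14).
socks <- c(rep(1:9, each = 2), 10:14)

# Repeat the draw 1000 times, recording the number of unmatched socks.
sim_singles <- replicate(1000, {
  picked <- sample(socks, size = 11, replace = FALSE)
  sum(table(picked) == 1)
})

table(sim_singles)
```

The resulting vector approximates the null distribution of the test statistic; the exact counts depend on the seed.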
Quantifying how far into the tails our observed count was.
table(sim_singles)
## sim_singles
##   1   3   5   7   9  11
##   2  48 248 411 250  41
table(sim_singles)[6]/1000
##    11
## 0.041
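The same tail probability can also be checked by probability theory instead of simulation. The sketch below counts the ways to draw 11 socks with no matching pair under \(N_{pairs} = 9\) and \(N_{singles} = 5\): choose \(k\) of the singletons, choose \(11 - k\) of the pairs, and take one of the two socks from each chosen pair. The variable names are illustrative.

```r
n_pairs <- 9
n_singles <- 5
n_pick <- 11

# k = number of true singletons among the picked socks.
k <- 0:n_singles

# choose() returns 0 when more pairs are requested than exist,
# so impossible values of k drop out of the sum automatically.
ways <- choose(n_singles, k) * choose(n_pairs, n_pick - k) * 2^(n_pick - k)

# Divide by the number of ways to pick any 11 of the 23 socks.
p_exact <- sum(ways) / choose(2 * n_pairs + n_singles, n_pick)
p_exact
```

This gives roughly 0.042, in line with the simulated proportion.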
Our two-tailed p-value is 0.082.
What is the best definition for our p-value in probability notation?
The result of a hypothesis test is a probability of the form:
\[ P(\textrm{ data or more extreme } | \ H_0 \textrm{ true }) \]
while most people think they're getting
\[ P(\ H_0 \textrm{ true } | \textrm{ data }) \]
How can we go from the former to the latter?
\[P(A \ | \ B) = \frac{P(A \textrm{ and } B)}{P(B)} \]
\[P(A \ | \ B) = \frac{P(B \ | \ A) \ P(A)}{P(B)} \]
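The second form follows from the first in one step. Applying the definition of conditional probability with the roles of \(A\) and \(B\) swapped, and then multiplying through by \(P(A)\), gives

\[P(B \ | \ A) = \frac{P(A \textrm{ and } B)}{P(A)} \quad \Longrightarrow \quad P(A \textrm{ and } B) = P(B \ | \ A) \ P(A) \]

Substituting this expression for the numerator of the definition of \(P(A \ | \ B)\) yields Bayes' rule.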
\[P(model \ | \ data) = \frac{P(data \ | \ model) \ P(model)}{P(data)} \]
What does it mean to think about \(P(model)\)?
Please draw your own subjective distributions for the following events.
A prior distribution is a probability distribution for a parameter that summarizes the information that you have before seeing the data.
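The `sock_sim` data frame inspected next is not constructed in these notes. A minimal sketch of how a table with the same columns could be generated is to draw parameters from a prior, simulate a pull of 11 socks for each draw, and record the result. The prior families and their parameters (negative binomial for the total number of socks, beta for the proportion that are paired), the helper name `sim_one`, and the number of simulations are all assumptions for illustration, not taken from the text.

```r
set.seed(1)     # assumption: arbitrary seed
n_picked <- 11  # socks pulled from the machine

# Hypothetical helper: one draw from the prior plus one simulated pull.
sim_one <- function() {
  # Assumed prior on the total number of socks: mean 30, sd 15.
  prior_mu <- 30
  prior_sd <- 15
  prior_size <- prior_mu^2 / (prior_sd^2 - prior_mu)
  n_socks <- rnbinom(1, mu = prior_mu, size = prior_size)

  # Assumed prior on the proportion of socks that come in pairs.
  prop_pairs <- rbeta(1, shape1 = 15, shape2 = 2)

  n_pairs <- round(floor(n_socks / 2) * prop_pairs)
  n_odd <- n_socks - 2 * n_pairs

  # Label socks so that the two socks of a pair share a label.
  socks <- rep(seq_len(n_pairs + n_odd), rep(c(2, 1), c(n_pairs, n_odd)))
  picked <- socks[sample.int(length(socks), min(n_picked, n_socks))]
  counts <- table(picked)

  c(unique = sum(counts == 1), pairs = sum(counts == 2),
    n_socks = n_socks, prop_pairs = prop_pairs)
}

sock_sim <- as.data.frame(t(replicate(10000, sim_one())))

# Keep only the simulated worlds that reproduce the observed data.
post <- sock_sim[sock_sim$unique == 11 & sock_sim$pairs == 0, ]
median(post$n_socks)
```

The retained rows approximate the posterior distribution of the parameters given 11 unique socks and no pairs, so a summary such as the median of `n_socks` is one candidate for a best guess.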
head(sock_sim)
##   unique pairs n_socks prop_pairs
## 1      3     4      16      0.970
## 2      7     2      33      0.914
## 3      9     1      51      0.929
## 4      1     4       9      0.955
## 5      9     1      45      0.851
## 6      9     1      21      0.726
sock_sim %>% filter(unique == 11, pairs == 0) %>% head()
##   unique pairs n_socks prop_pairs
## 1     11     0      49      0.692
## 2     11     0      37      0.873
## 3     11     0      49      0.815
## 4     11     0      62      0.961
## 5     11     0      53      0.974
## 6     11     0      59      0.847
What is your best guess for the number of socks that Karl has?
\[ 21 \times 2 + 3 = 45 \textrm{ socks} \]
Bayesian methods . . .